Abstract

Wikipedia (WP), as a collaborative, dynamical system of humans, is an appropriate subject for social
studies. Every single action of the members of this society, i.e., the editors, is recorded and accessible.
Using the cumulative data of 34 Wikipedias in different languages, we characterize the universalities
and differences in the temporal activity patterns of editors. Based on these data, we estimate the
geographical distribution of the editors of each WP around the globe. Furthermore, we clarify the
differences among groups of WPs, which originate in the variance of cultural and social features of the
communities of editors.

Introduction 

Relying on the data gathered by recently developed information and communication technologies (ICT),
studies of social systems have entered a new era, in which one is able to track and analyze the
behavior of a large number of individuals and the interactions between them in detail. Among all
examples, recent investigations based on cell phone records (calls [1] and text messages [2]) and
web-based societies and media (web pages [3]; movie, news and status sharing sites, e.g., YouTube.com [4],
digg.com [5] and twitter.com [6]) have opened very interesting insights into features of the collective and
cooperative dynamics of human systems. Wikipedia (WP), a free, web-based encyclopedia that is
entirely written and edited by volunteers from all around the world, has also attracted the attention of
many researchers recently [7-10] (for a recent review, see [11]). To study WP and to understand and model
its evolution [7], coverage [12], conflicts or editorial wars [13,14], user reputation [15] and many other
issues, we should obtain basic information about the community of its editors, i.e., their age, education
level, nationality, individual editorial patterns, fields of interest and many other aspects. Yet systematic
and unbiased studies in this direction have been rare. The main barrier is privacy, which prohibits any
attempt to obtain personal data of committed editors.

There are two ways of contributing to Wikipedia. The first is editing as an unregistered user; in
this case all the edits are identified by the IP address of the editor, and it is therefore easy to
locate the editor and collect some geographical information about him or her. Most editors, however,
take the second way, editing under a registered user name, which hides the real-world identity and IP
address of the editor and is therefore a much more secure way of contributing. Moreover, the contributions
of such serious editors are unified under one single nickname, irrespective of the IP address
they use to connect to the network, and can be counted as a measure of maturity in the promotion
processes. Cohen extracted geographical data from the IP addresses of unregistered editors of the English
WP, integrated them over time and concluded that about 80% of the edits on the English WP originate
from a few English-speaking countries with high Internet penetration rates, i.e., 60% from the USA, 12%
from the UK, 7% from Canada and 5% from Australia [16]. However, contributions from unregistered editors
are limited to less than 10 percent for many WPs (see Table 1). Moreover, the rather small sample of
unregistered users does not represent the features of average users, as will be discussed later. Therefore,
indirect methods to locate editors or to obtain any kind of information about the community are highly
desirable. One of our aims is to show that conclusions about the geographical distribution of (registered)
editors can be drawn from the temporal patterns of WP users.

Recently, much effort has been devoted to describing and understanding the extreme temporal
inhomogeneity of human activities, represented by the burstiness of activities and the fat-tailed distribution
of time intervals between events [17]. While the circadian and other periodic characteristics of the temporal
patterns of human activities cannot account for the whole richness of bursty behavior [18], they remain
important for understanding the entire dynamics of the systems. These regularities are induced by the
circadian and seasonal cycles of nature [19] on the one hand and by cultural aspects on the other.
Consequently, studies of the diurnal patterns of Internet traffic have brought interesting information about
individual habits of Internet usage in different societies [20,21]. In this paper our focus is on such cyclic
behavior, while investigations of other aspects of temporal inhomogeneities, like short-time bursty behavior
and inter-event interval distributions, are reported elsewhere [22].

West et al. have tried to make use of the diurnal characteristics of edits to detect vandalism and
destructive edits [23]. Their study was again restricted to tracking positive and negative edits from
unregistered editors, for which they found that most of the "offending edits" are committed during working
hours and working days, rather than after dark and at weekends. In the administration of Wikipedia it is
also becoming fashionable to use the personal temporal fingerprint of editors as a side tool to detect and
prevent sock-puppetry^1, although this can only be done with high respect for the privacy policies of the
Wikimedia Foundation^2.

In this work, we first characterize the circadian pattern of edits on Wikipedia by analyzing
massive data from 34 WPs. We then introduce a novel method to find the geographical distribution
of the editors of large international WPs, e.g., English, simple English and Spanish. Furthermore,
we analyze the temporal behavior of editors on longer time scales, i.e., weekly patterns, and report
significant differences between various societies.



Methods 

This work is carried out on 34 WPs selected from the largest ones with respect to the number of articles,
i.e., those that have more than 100,000 articles^3. Within the sample, the total numbers of edits and
editors vary from 3 M to 455 M and from 46 k to 14 M, respectively. Some statistics about the WPs
under investigation are reported in Table 1.

We considered every single edit performed on each WP and, using the timestamps assigned to the edits,
calculated the overall activity of users as a function of the time of day and the day of the week. To see
the universality of circadian activity patterns among editors of all the different languages, we assumed a
local time offset for each language. Clearly, some languages are spoken in more than one country or time
zone, e.g., Spanish and Arabic, whereas others are very localized in a specific time zone, e.g., Italian and
Hungarian. For the first sort of languages, we initially considered the time offset of the best-known
origin of the language. For the special cases of the English and simple English Wikipedias, we initially
considered an offset corresponding to USA Central Time. The time offset assigned to each language is
reported in the second-to-last column of Table 1. Note that, due to the lack of information such as the IP
addresses of users, these initial assumptions about the origin of edits and the corresponding time offset
cannot be improved at this step. It is one of our goals to implement a method, based on the average
behavior of WP editors, that is able to determine the percentage of the contributions coming from
different geographic units. This method is described in the next section, together with the empirical
observations.

^1 WP editors are generally expected to edit using only one account. Sock puppetry is the use of multiple
accounts to deceive other editors, disrupt discussions, distort consensus, avoid sanctions, etc., which is
forbidden by WP rules.

^2 http://wikimediafoundation.org/wiki/Privacy_policy

^3 The Volapük and Waray-Waray Wikipedias are excluded from the list due to their small numbers of
speakers and Wikipedians, and considering that many of their articles are robotically generated. The simple
English Wikipedia is included in the list, although it contains only around 70,000 articles.

Results 

Circadian patterns 

We calculated the normalized number of edits for each of the 34 WPs in consecutive time windows
of one hour over the 24 hours of the day. The relative activity level of each time window is calculated by
dividing the number of edits within the time window by the total number of edits.
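
The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `hourly_activity`, the representation of edits as naive `datetime` objects in UTC, and the toy timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def hourly_activity(utc_timestamps, utc_offset_hours):
    """Fraction of edits in each local hour of the day (24 values).

    utc_timestamps: iterable of naive datetime objects, assumed to be in UTC.
    utc_offset_hours: assumed local-time offset of the language, e.g. +1 or -6.
    """
    shift = timedelta(hours=utc_offset_hours)
    counts = [0] * 24
    for stamp in utc_timestamps:
        # Move the edit to the assumed local time, then bin by hour.
        counts[(stamp + shift).hour] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no edits to normalize")
    # Relative activity: edits in the window divided by all edits.
    return [count / total for count in counts]


# Toy input (hypothetical data): four edits, with an assumed offset of UTC+1.
edits = [
    datetime(2010, 1, 1, 23, 30),
    datetime(2010, 1, 2, 0, 10),
    datetime(2010, 1, 2, 5, 0),
    datetime(2010, 1, 2, 5, 59),
]
activity = hourly_activity(edits, utc_offset_hours=1)
```

With the assumed offset of +1, the four toy edits fall into local hours 0, 1, 6 and 6, so half of the activity is assigned to the 6 A.M. window.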

In this way the circadian activity patterns depicted in Figure 1(a) are obtained. Most WPs show a
universal pattern: a minimum of activity at around 6 A.M., followed by a rapid increase up to noon. The
activity then shows a slight increase until around 9 P.M., where it starts to decrease during the night.
Qualitatively similar shapes are observed for other kinds of human activity, e.g., cell phone calls and text
messages [18] and Internet instant messaging [24].

Deviations 

Among the 34 investigated WPs, there are four that deviate significantly from all the others with respect
to their circadian patterns. The diurnal activity of these four outliers, the Spanish, Portuguese, English
and simple English WPs, is shown in Figure 1(c) and (d). In the case of Spanish and Portuguese, the
main difference from the rest of the WPs is a slight shift to the right (later times). Bearing in mind that
Spain and Portugal both use local times with a larger offset than other countries at the same
longitude, this comes as no surprise. Besides that, the rather large number of speakers in Latin America
not only favors this shift but also flattens the overall amplitude of the diurnal pattern (this is
discussed later in more detail). Finally, cultural features of those two countries might contribute
to this observation.

In the case of the English and simple English WPs, we assumed for simplicity the reference to be
UTC-6 (which corresponds to the Central Time Zone of the US). Naturally, the deviations from the
universal pattern are very strong, indicating the complex origin of the English WP. We will come
back to this point later.

To better illustrate the deviation from an average circadian pattern, we calculated the weighted
average of the curves in Figure 1(a), with each WP's pattern weighted by its total number of edits. The
average curve is depicted in Figure 1(b). We can now calculate the difference from this average pattern,
D(t), for each WP at different times of the day t. According to the shape of D(t), and by maximizing the
cross-correlation coefficient, almost all WPs could be sorted into 4 categories, as shown in Figure 2. Two of
these categories, Figure 2(a) and (c), consist of WPs that have less activity during the night compared to
the average pattern.

These WPs are all in European languages that are spoken in single, localized regions, and
therefore the minimum of activity of their editors is deeper than that of the others. Figure 2(b) shows a
category consisting of Asian languages. These WPs are more active during the night and less active during
working hours compared to the average. In the last category, shown in Figure 2(d), higher activity
during the night and lower activity during working hours is a clear sign of an extended distribution of
contributors over different time zones. Arabic, Persian and Chinese belong to this category, in addition to
Spanish, Portuguese, English and Simple English (not shown).
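
The edit-weighted average curve and the deviation D(t) discussed above can be sketched as follows. The text does not specify how the cross-correlation coefficient is computed, so the Pearson correlation used here is an assumption, as are all function names and the two toy patterns.

```python
from math import sqrt


def weighted_average(patterns, total_edits):
    """Average of 24-hour patterns, each weighted by its total number of edits."""
    weight_sum = sum(total_edits[name] for name in patterns)
    return [
        sum(total_edits[name] * patterns[name][hour] for name in patterns) / weight_sum
        for hour in range(24)
    ]


def deviation(pattern, average):
    """D(t): difference between one WP's pattern and the average pattern."""
    return [value - mean for value, mean in zip(pattern, average)]


def pearson(x, y):
    """Pearson correlation, used here as an assumed similarity measure for D(t)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant curve")
    return cov / sqrt(var_x * var_y)


# Toy data (hypothetical): one sharply peaked WP and one perfectly flat WP.
peaked = [0.0] * 24
peaked[12] = 1.0
flat = [1.0 / 24] * 24
patterns = {"peaked": peaked, "flat": flat}
edits = {"peaked": 3, "flat": 1}

average = weighted_average(patterns, edits)
dev_peaked = deviation(peaked, average)
dev_flat = deviation(flat, average)
similarity = pearson(dev_peaked, dev_flat)
```

In this two-WP toy case the deviations are exact mirror images of each other, so their correlation is -1; WPs whose D(t) curves correlate strongly and positively would be candidates for the same category.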

Another way to look at the locality of the languages is to quantify the sleep depth. The sleep depth
is defined as the difference between the maximum and the minimum of the activity of the users of each
language, and may be taken as a measure of how localized the global distribution of the editors of the
corresponding language is. The calculated depth values are reported in the last column of Table 1. These
values vary from 2.3 for simple English to 5.6 for Italian. Among the WPs with small sleep depth are
Arabic, Indonesian, Persian and English. The average sleep depth for the category of Fig. 2(d) is 2.8,
with a standard deviation of 0.4.

Among the languages with large sleep depth are Italian, Hungarian, Polish, Catalan and Dutch.
These are all languages that are mostly spoken in a narrow area of the world and are therefore very
localized in time zones. The average sleep depth for the category of Fig. 2(c) is 4.9, with a standard
deviation of 0.4. It should also be mentioned that, although Spanish and Portuguese are both widely spoken
in different areas and different time zones, the sleep depths of both lie in the middle range (4.4 and
4.2, respectively). For a more precise interpretation, we estimate the share of editors from different
areas in each WP in the next section.
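
As a small illustration of the sleep depth defined above, the sketch below computes it for a localized toy pattern and for an equal mixture of that pattern with a copy shifted by 12 hours. The reported values (2.3 to 5.6) suggest that the activity is expressed in percent of all edits; that reading, the helper names and the toy curve are assumptions.

```python
def sleep_depth(activity_percent):
    """Maximum minus minimum of a 24-hour activity pattern (assumed in percent)."""
    if len(activity_percent) != 24:
        raise ValueError("expected 24 hourly values")
    return max(activity_percent) - min(activity_percent)


def circular_shift(curve, hours):
    """Delay a 24-hour curve by `hours`: the value at t comes from t - hours."""
    return [curve[(t - hours) % 24] for t in range(24)]


# Toy pattern (hypothetical): 12 quiet hours at 3% and 12 busy hours at 16/3%,
# so that the 24 values sum to 100%.
localized = [3.0] * 12 + [16.0 / 3.0] * 12

# Equal mixture of two communities living 12 time zones apart.
mixed = [
    0.5 * a + 0.5 * b
    for a, b in zip(localized, circular_shift(localized, 12))
]

depth_localized = sleep_depth(localized)
depth_mixed = sleep_depth(mixed)
```

The mixture is completely flat, so its sleep depth vanishes, which mirrors the argument that a small sleep depth signals editors spread over several time zones.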
Geographical distribution of editors

As mentioned above, due to privacy policy issues, there is no access to locating information, such as
IP addresses, for registered editors. There are studies that consider only the contributions of unregistered
users, but these give a very rough image of the real distribution of editors around the globe [16]. We aim at
a better method by decomposing the overall activity pattern of each WP into basic elements, each of which
is assumed to be representative of contributions originating purely from a certain time zone. For this
purpose, we averaged the activity patterns of the 10 WPs with the deepest sleep to obtain a smooth curve
that has the features of the collective activity of users in synchrony (hereafter called the standard curve
$S(t)$). In the next step, we assume that the activity pattern $A(t)$ of a WP with a wide spatial distribution
of editors can be simulated by a superposition of $N$ standard curves with different time shifts $\tau_i$ and
different weights $w_i$, for $i = 1$ to $N$,

$$A(t) = \sum_{i=1}^{N} w_i \, S(t - \Delta\tau_i) \qquad (1)$$

where $\Delta\tau_i$ is the difference between $\tau_i$ and the assumed time offset of the language (see Table 1).

In general, one could minimize the error of the simulated activity pattern of each WP for N = 24
different offsets and find the optimal weighting. Clearly, the weights are proportional to the volume of
contributions from each time zone. We carried out the optimization following this outline, but in a more
supervised manner. We restricted N to the number of time zones that are relevant candidates for being
an origin of contributions, e.g., we excluded the time zones of uninhabited areas of the earth. Furthermore,
to reduce the complexity of the calculations and to avoid multiple solutions, we reduced N to the number of
areas that have a considerable number of speakers of the language. In many cases, a superposition of
3 to 6 standard curves was enough to fit the empirical data with a high correlation coefficient
between the simulated and empirical data sets (see Figure 3), whereas taking a larger N does not decrease
the error and only leads to more weights $w_i$ equal to zero. Finally, by a proper combination of demographic
information and optimization techniques, we estimated the shares of different regions in 9 different WPs.
These estimations are summarized in Figure 4. Though in some cases the error function is rather flat around
its minimum, leading to a relatively large tolerance in the calculated weights, the existence of multiple
separated minima is ruled out by applying the demographic restrictions.
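
The decomposition of Eq. (1) can be illustrated with a small fitting sketch. The optimizer used in the paper is not specified, so the projected coordinate descent below (a simple non-negative least-squares scheme) is a stand-in chosen for this example; the synthetic standard curve, the candidate delays and the "true" weights are likewise hypothetical.

```python
from math import cos, pi


def shifted(curve, delay_hours):
    """S(t - delay): circularly delay a 24-value curve by whole hours."""
    return [curve[(t - delay_hours) % 24] for t in range(24)]


def fit_weights(activity, standard, delays, sweeps=2000):
    """Non-negative weights w_i minimizing sum_t (A(t) - sum_i w_i S(t - d_i))^2.

    Projected coordinate descent: an assumed stand-in for the optimizer,
    which the paper does not specify.
    """
    basis = [shifted(standard, d) for d in delays]
    norms = [sum(v * v for v in b) for b in basis]
    weights = [0.0] * len(delays)
    residual = list(activity)  # A(t) minus the current model
    for _ in range(sweeps):
        for i, b in enumerate(basis):
            # Best value of w_i with all other weights fixed, clipped at zero.
            step = sum(r * v for r, v in zip(residual, b)) / norms[i]
            new_weight = max(0.0, weights[i] + step)
            delta = new_weight - weights[i]
            if delta != 0.0:
                residual = [r - delta * v for r, v in zip(residual, b)]
                weights[i] = new_weight
    return weights


# Synthetic standard curve (hypothetical shape, normalized to sum to 1).
standard = [
    (1.0 + 0.6 * cos(2 * pi * (t - 16) / 24) + 0.25 * cos(4 * pi * (t - 19) / 24)) / 24
    for t in range(24)
]

# Build a mixed pattern from three candidate regions with known shares.
delays = [0, 6, 12]
true_weights = [0.6, 0.3, 0.1]
components = [shifted(standard, d) for d in delays]
activity = [
    sum(w * comp[t] for w, comp in zip(true_weights, components))
    for t in range(24)
]

fitted = fit_weights(activity, standard, delays)
```

On this noise-free toy input the fitted weights reproduce the shares used to build the mixture. Restricting `delays` to a few plausible regions corresponds to the supervised choice of N described above; with real data the error surface can be flat, so the result should be read together with demographic information.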
Weekly patterns

We also considered the activity of editors during the week and its dependence on the day of the week. These
results are shown in Figure 5. According to the weekly pattern of activity, we could sort 28 out of
34 WPs into 4 categories, which belong to the two main categories of "working days" and "weekend"
activity. The upper-left panel of Figure 5 shows those WPs whose editors are most active
during the working days of the week. Among them are English, simple English, German, Spanish,
Portuguese and Italian. In the rest of the WPs, a large part of the edits are done during weekends. In the
class of the Polish, Dutch, Korean and Japanese WPs (upper-right panel of Figure 5), equal activities are
shown on Saturdays and Sundays, whereas in the class of the Danish, Swedish, Norwegian and Finnish WPs,
editors have very low activity on Saturdays. The last class of "weekend" WPs consists of the Arabic and
Persian WPs, in which Fridays are also active days in addition to Saturdays and Sundays. The latter is
no surprise, considering that Friday is a public holiday in all of the countries of origin, along with
Saturday in most of them.
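
The weekly pattern can be computed in the same way as the circadian one. The sketch below is a minimal illustration with hypothetical names and toy timestamps; the `weekend_days` parameter is an assumption introduced here to accommodate the different weekend conventions mentioned in the text.

```python
from datetime import datetime, timedelta


def weekly_activity(utc_timestamps, utc_offset_hours):
    """Fraction of edits on each local weekday (index 0 is Monday, 6 is Sunday)."""
    shift = timedelta(hours=utc_offset_hours)
    counts = [0] * 7
    for stamp in utc_timestamps:
        counts[(stamp + shift).weekday()] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no edits to normalize")
    return [count / total for count in counts]


def weekend_share(weekly, weekend_days=(5, 6)):
    """Share of edits on the given weekend days (default: Saturday and Sunday)."""
    return sum(weekly[day] for day in weekend_days)


# Toy input (hypothetical data), assumed offset UTC+1.
# 1 January 2010 was a Friday; 23:30 UTC is already Saturday in UTC+1.
edits = [
    datetime(2010, 1, 1, 23, 30),
    datetime(2010, 1, 3, 12, 0),
    datetime(2010, 1, 4, 9, 0),
    datetime(2010, 1, 4, 10, 0),
]
weekly = weekly_activity(edits, utc_offset_hours=1)
sat_sun = weekend_share(weekly)
fri_sat = weekend_share(weekly, weekend_days=(4, 5))
```

Comparing `weekend_share` across WPs with the appropriate weekend convention is one simple way to separate "working days" WPs from "weekend" WPs.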

Discussion 

The novel approach to the collective characteristics of the community of WP editors described above
enables us, for the first time, to shed light on less studied aspects of Wikipedia. Based on the reported
results, many basic questions and concerns about the Wikimedia projects as a whole can be investigated.
Knowing the spatial distribution of the editors of a certain WP would be a reliable basis for explaining,
to a good extent, specific biases in WP articles, heterogeneous topical coverage, and the origins of conflicts
and editorial wars. In addition, these results raise new questions and puzzles. Considering
the large population of English speakers in North America compared to Europe, and the fact that the
Internet is most developed in North America, the estimated share of only around one half for North America
in the English WP is a puzzle, which definitely needs further multidisciplinary studies. In the case of the
Simple English WP, the European share is even larger and the share of the Far East is also increased, which
is not surprising, since this WP is meant to be of use to non-native speakers (though it is not
necessarily written by them). Note that the previous results of [16] and [52] are partially supported by the
results reported here. For instance, a share of less than 10% for Australian editors in the English WP is
reported in both articles. Unfortunately, there is no explicit focus on the contributions from European
countries in the mentioned works, and it seems that the large amount of effort by European editors was
overlooked.

We have, however, repeated the measurements on the IP addresses of unregistered users more
generally, for different WPs, by following every single edit of this type to locate the editor. First,
we constructed the "precise" activity pattern of unregistered users, as shown in Fig. 1(b). The activity
pattern of unregistered users clearly has a deeper minimum at night and a higher maximum during working
hours compared to most of the other curves. Unregistered users contribute to WP occasionally and
mostly with only a few edits from the same IP address. To be actively editing even at night, one must be
extremely committed to WP; therefore the deep sleep of the activity curve of unregistered users comes as
no surprise. We believe that the sample of unregistered users represents the activity of WP readers,
who edit rarely, when they notice the need for tiny modifications here and there while reading the articles,
rather than that of committed users, who write the main body of the articles. The percentage of
contributions by unregistered users is reported in Table 1 for all 34 WPs. This value varies between 4%
for the Slovenian and 37% for the Japanese WP. We compared the geographical distribution of editors
obtained by locating them through their IP addresses with the results described above and observed
that both methods give broadly similar results for the WPs with a rather large share of unregistered users,
whereas they deviate for WPs with a small share of unregistered users. Finally, one should consider the
fact that committed users sometimes edit without using their registered user name, in order to vandalize or
to edit specific controversial articles without leaving any trace that might cause trouble for the original
user name. In such cases, an "open proxy" (with an arbitrary IP address) is used most of the time to
hide the real IP address of the editor. This makes the analysis based on IP addresses even less reliable.

Another interesting part of the results concerns the Persian WP. Although more than 70% of native Persian
speakers live in Iran and the rest in closely neighboring countries, the corresponding WP appears
near the top of the list of WPs with small sleep depth. In addition, the estimated share of edits from Iran is
only about 45%. This could be due to the following facts. 1) Strong restrictions on Internet usage
have been applied by the Iranian government over the years as a consequence of socio-political issues^4, which
makes it difficult to contribute to WP using Iran-based ISPs. 2) Iran has a high rate of emigration
of students and scholars. This has led to the formation of large intellectual communities outside Iran, which
might be responsible for a considerable amount of the edits in the Persian WP.

^4 Wikipedia is currently one of the few remaining unbanned Web 2.0 web sites in Iran.

The low level of contribution to the French WP by North American editors and to the Arabic WP
by Egyptian editors could have its roots in the differences between the spoken dialects and the standard
languages. Though both languages (French and Arabic) are among the official languages in the mentioned
regions, it seems that the divergence between dialects plays an important role in suppressing contributions
to WP. It should be mentioned that there is a separate WP in the local dialect of Egypt (the Egyptian Arabic
Wikipedia), and that there has recently been an unsuccessful effort to launch a Canadian French Wikipedia.
Therefore we think that the estimated contributions could be of interest to the WP community too,
and could inform the process of deciding on a new WP in a local dialect.

Clearly, the presented method also has its limitations. For instance, obtaining information about
the distribution of editors along different latitudes is impossible by considering the time stamps alone.
Moreover, the resolution of the regional estimations is not very high. Because of many factors, e.g.,
the use of summer time in many countries, the method cannot claim a resolution finer than a one-hour
stripe. For example, in the case of the English WP, the supervised optimization results in a ratio of 3
for the weight of GMT+1 over GMT+0, corresponding to Central European and Western European time.
But for the reasons mentioned, a distinction between the shares of very close time zones is not
justifiable. Moreover, in some cases the error of the simulated activity pattern is not very sensitive to
changes in the weights of spatially close offsets. However, all the results presented above are precise up to
the last significant digit.

Putting the deviations from the average daily activity side by side with the weekly activity for all WPs,
one is able to draw very clear conclusions. For example, the daily patterns of Asian languages (e.g.,
Japanese, Chinese and Korean) show higher activity during evenings and nights, along with a high level
of activity at weekends. This can be related partly to the length of working hours in the corresponding
countries. This general picture, which holds partially for Turkey, Russia and Israel too, could be
closely related to the high average working hours per week (more than 40 hours in all the mentioned
cases^5) in those countries. Furthermore, among European countries we see the same tendency: in
the countries with rather long working times, edits are mostly done later in the evening.

It should be mentioned that the same analysis has been carried out for seasonal patterns, to extract the
effects of changes in daylight timing, but the large fluctuations in the average behavior make it very
difficult to draw relevant conclusions. The only significant large-scale seasonal pattern is the reduction of
activity when approaching the New Year holidays for many WPs.

^5 According to the dataset of the Organisation for Economic Co-operation and Development: http://stats.oecd.org

In conclusion, based on a dataset of time-stamped edits on different Wikipedias, we studied the
diurnal and weekly patterns of the activity of editors. We could see a universal circadian pattern for all WPs,
which has its minimum at dawn and its maximum in the late afternoon and early evening. Based on this
investigation, we also argued that, using a weighted mixture of contributions from different time zones
and an optimization procedure, we can estimate the different contributions to a WP. In particular, we
observe that a considerably large part of the edits on the English and simple English WPs originate from
Europe, and that the share of North America is below expectations. The same type of analysis was also
performed for other WPs in different languages. In contrast to the diurnal pattern, which is universal to a
great extent, the weekly activity patterns of WPs show remarkable differences. We could, however, identify
two main categories, namely "weekend" and "working days" active WPs. Further studies are needed to
explain these observations in detail and relate them to cultural and social differences.
Figure 1. Normalized activity of editors for (a) all WPs listed in Table 1, excluding English,
simple English, Spanish and Portuguese; (b) the average curve extracted from the curves in (a) and the
standard curve extracted from the 10 most localized WPs, along with the activity curve of unregistered
users, whose IP addresses are known, so that one is able to locate them and obtain the local time
zone precisely; (c) activity patterns of the Spanish (red) and Portuguese (green) WPs; and (d) activity
patterns of the English (red) and simple English (green) WPs.

Figure 2. Deviation of activity patterns from the average curve, leading to 4 different
categories of WPs. The gray dotted line is the average deviation of each category. The sleep depths (for
the definition, see the text) of categories a-d are 4.5±0.2, 4.3±0.2, 4.9±0.1 and 2.8±0.2, respectively.

Figure 3. Decomposition of the activity pattern of the English WP into 5 shifted standard curves
with different weights. The blue line is the empirical data, the yellow curve is the standard curve (see
the text for the definition), the thin dotted lines are the shifted and weighted standard curves, and the red
dotted line is their linear superposition, which models the empirical data properly.



[Figure 4 graphic: pie charts of the estimated regional shares for nine WPs. Legend entries, panel by
panel: North America, Europe, Far East & Australia; (pt) South America, Portugal, Southern Africa;
West-North Africa, Middle East, North America, Mid-North Africa; (simple) North America, Europe,
Far East & Australia; France & West Africa, Central Africa & Madagascar, North America; Spain,
South America, Central America; Iran, Europe, North America; East China, West China, Europe,
North America; Israel, Europe, North America.]

Figure 4. Estimation of users' contributions from different regions. By carefully combining the
outputs of the optimization process described in the text with the demographic data of each language, the
share of each region in each WP is estimated. For the sake of accuracy in reporting the results, the
contributions of closely located regions are merged in some cases.

Figure 5. Activity of editors on different days of the week, categorized in 4 subcategories.
There are two major categories: weekend activity, (b)-(d), and weekday activity, (a). Please note that the
WPs in (d) are in languages spoken mostly in Muslim countries, which have either a Thursday-Friday weekend
(Saudi Arabia, Oman and Yemen), a Friday-Saturday weekend (Algeria, Bahrain, Egypt, Iraq, Jordan,
Kuwait, Libya, Qatar, Sudan, Syria, United Arab Emirates), or a Saturday-Sunday weekend (Morocco,
Tunisia). In Iran and Afghanistan, the Persian-speaking countries, only Friday is considered the weekend.
Table 1. Wikipedias Statistics

WP      Language    Country      Speakers  Articles  Edits  Users  Active  Unreg.     UTC      Sleep
                                 (M)       (k)       (M)    (k)    users   edits (%)  offset   depth
de      German      Germany      ?         ?         ?      ?      ?       ?          ?        ?
en      English     ?            ?         ?         455    ?      ?       ?          -6 (d)   ?
eo      Esperanto   (b)          ?         ?         ?      ?      ?       ?          +1 (e)   ?
es      Spanish     (a)          ?         ?         ?      ?      ?       ?          +1 (f)   4.4
fa      Persian     Iran         ?         ?         ?      ?      ?       ?          ?        ?
fi      Finnish     Finland      ?         ?         ?      ?      ?       ?          ?        ?
fr      French      France       ?         ?         ?      ?      ?       ?          ?        ?
he      Hebrew      Israel       ?         ?         ?      ?      ?       ?          ?        ?
hu      Hungarian   Hungary      ?         ?         ?      ?      ?       ?          ?        ?
id      Indonesian  Indonesia    ?         ?         ?      ?      ?       ?          ?        ?
it      Italian     Italy        ?         ?         ?      ?      ?       ?          ?        5.6
ja      Japanese    Japan        ?         ?         ?      ?      ?       37         ?        ?
ko      Korean      Korea        ?         ?         ?      ?      ?       ?          ?        ?
lt      Lithuanian  Lithuania    ?         ?         ?      ?      ?       ?          ?        ?
nl      Dutch       Netherlands  ?         ?         ?      ?      ?       ?          ?        ?
no      Norwegian   Norway       ?         ?         ?      ?      ?       ?          ?        ?
pl      Polish      Poland       ?         ?         ?      ?      ?       ?          ?        ?
pt      Portuguese  ?            ?         ?         ?      ?      ?       ?          0 (g)    4.2
ro      Romanian    Romania      ?         ?         ?      ?      ?       ?          ?        ?
ru      Russian     Russia       277       699       35     652    12841   16         3        4.3
simple  sim.English ?            ?         69        2      176    746     13         -6 (d)   2.3
sk      Slovak      Slovakia     5         122       3      58     612     7          1        3.7
sl      Slovenian   Slovenia     1         109       2      78     579     4          1        3.8
sr      Serbian     Serbia       11        141       4      81     672     5          1        3.5
sv      Swedish     Sweden       9         392       14     221    3467    14         1        4.5
tr      Turkish     Turkey       75        158       9      337    2499    24         2        4.5
uk      Ukrainian   Ukraine      37        274       6      99     1929    5          2        4
vi      Vietnamese  Vietnam      86        201       4      223    1156    8          7        2.7
zh      Chinese     China        1300      351       16     984    5696    13         8        3.7

(a) For the languages which are widely spoken in the world, the origin country is not well-defined.
(b) Esperanto has never been an official language of any country.
(c) Egypt (the most populated Arab country) time zone.
(d) USA Central Standard Time zone.
(e) Central European time zone.
(f) Spain time zone.
(g) Portugal time zone.



Statistics about the WPs under investigation. The name of the WP, the language, the most populated country
in which the language is spoken, and the total number of speakers in the world (millions) are reported in
columns 1 to 4, followed by the number of articles (thousands) in the WP, the number of edits (millions),
the number of users (thousands), the number of active users (users who have edited in the last month), and
the percentage of edits by unregistered users (identified by their IP addresses) among all edits. The last
two columns contain the UTC offset assigned to each WP and the sleep depth, respectively. The
demographic data are taken from Wikipedia and are meant only to give an impression to the reader; no
analysis in the paper is based on these data.



